Overview

Dataset statistics

Number of variables: 14
Number of observations: 4600
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 503.2 KiB
Average record size in memory: 112.0 B

Variable types

Numeric: 11
Categorical: 3

Alerts

price is highly correlated with sqft_living and 1 other field (High correlation)
bedrooms is highly correlated with bathrooms and 1 other field (High correlation)
bathrooms is highly correlated with bedrooms and 4 other fields (High correlation)
sqft_living is highly correlated with price and 3 other fields (High correlation)
floors is highly correlated with yr_built (High correlation)
sqft_above is highly correlated with price and 4 other fields (High correlation)
yr_built is highly correlated with bathrooms and 3 other fields (High correlation)
condition is highly correlated with yr_built (High correlation)
sqft_basement is highly correlated with bathrooms and 2 other fields (High correlation)
yr_renovated is highly correlated with yr_built (High correlation)
price is highly skewed (γ1 = 24.79093256) (Skewed)
price has 49 (1.1%) zeros (Zeros)
sqft_basement has 2745 (59.7%) zeros (Zeros)
yr_renovated has 2735 (59.5%) zeros (Zeros)
city has 123 (2.7%) zeros (Zeros)
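
The percentages in the Zeros alerts appear to be the zero count divided by the 4600 observations, rounded to one decimal. A minimal Python check, using only the counts reported in the alerts (the variable names are illustrative):

```python
# Recompute the zero percentages in the alerts from the reported counts.
N_OBSERVATIONS = 4600

zero_counts = {
    "price": 49,
    "sqft_basement": 2745,
    "yr_renovated": 2735,
    "city": 123,
}

# Share of zeros per variable, as a percentage rounded to one decimal.
zero_pct = {
    name: round(100 * count / N_OBSERVATIONS, 1)
    for name, count in zero_counts.items()
}

for name, pct in zero_pct.items():
    print(f"{name}: {pct}%")
```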

Reproduction

Analysis started: 2022-11-21 06:48:24.217965
Analysis finished: 2022-11-21 06:48:41.505127
Duration: 17.29 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
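
A report of this kind is typically produced with the `ProfileReport` class of pandas-profiling. The sketch below assumes the data sits in a CSV file; the paths `data.csv` and `report.html` and the report title are placeholders, since the original file name and settings are not stated here.

```python
# Sketch: regenerate a profiling report with pandas-profiling.
# "data.csv", "report.html" and the title are placeholders, not taken from the report.

def build_report(csv_path: str = "data.csv", out_path: str = "report.html") -> str:
    # Imported lazily so the function can be defined even where the
    # libraries are not installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Profiling report")
    profile.to_file(out_path)  # writes the HTML report to disk
    return out_path
```

Calling `build_report()` with the real input path should write an HTML file with the same sections as this report, although timings and plot details may differ between library versions.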

Variables

price
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 1741
Distinct (%): 37.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 551962.9885
Minimum: 0
Maximum: 26590000
Zeros: 49
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 200000
Q1: 322875
median: 460943.4615
Q3: 654962.5
95-th percentile: 1184050
Maximum: 26590000
Range: 26590000
Interquartile range (IQR): 332087.5

Descriptive statistics

Standard deviation: 563834.7025
Coefficient of variation (CV): 1.021508171
Kurtosis: 1044.352151
Mean: 551962.9885
Median Absolute Deviation (MAD): 157500
Skewness: 24.79093256
Sum: 2539029747
Variance: 3.179095718 × 10^11
Monotonicity: Not monotonic
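
Several of these figures follow from the others: the interquartile range is Q3 minus Q1, the range is maximum minus minimum, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short Python check with the price values copied from this section (the inputs are rounded, so the comparison uses a tolerance):

```python
import math

# Values copied from the price statistics in this report.
q1, q3 = 322875.0, 654962.5
minimum, maximum = 0.0, 26590000.0
mean, std = 551962.9885, 563834.7025

iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance

print("IQR:", iqr)
print("Range:", value_range)
print("CV matches report:", math.isclose(cv, 1.021508171, rel_tol=1e-6))
print("Variance matches report:", math.isclose(variance, 3.179095718e11, rel_tol=1e-6))
```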
Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
0                      49      1.1%
300000                 42      0.9%
400000                 31      0.7%
440000                 29      0.6%
450000                 29      0.6%
600000                 29      0.6%
350000                 28      0.6%
250000                 27      0.6%
435000                 27      0.6%
415000                 27      0.6%
Other values (1731)    4282    93.1%

Smallest values

Value                  Count   Frequency (%)
0                      49      1.1%
7800                   1       < 0.1%
80000                  1       < 0.1%
83000                  1       < 0.1%
83300                  2       < 0.1%
84350                  1       < 0.1%
87500                  1       < 0.1%
90000                  2       < 0.1%
100000                 4       0.1%
102500                 1       < 0.1%

Largest values

Value                  Count   Frequency (%)
26590000               1       < 0.1%
12899000               1       < 0.1%
7062500                1       < 0.1%
4668000                1       < 0.1%
4489000                1       < 0.1%
3800000                1       < 0.1%
3710000                1       < 0.1%
3200000                1       < 0.1%
3100000                1       < 0.1%
3000000                1       < 0.1%

bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.400869565
Minimum: 0
Maximum: 9
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 3
median: 3
Q3: 4
95-th percentile: 5
Maximum: 9
Range: 9
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9088481155
Coefficient of variation (CV): 0.2672399215
Kurtosis: 1.235377429
Mean: 3.400869565
Median Absolute Deviation (MAD): 1
Skewness: 0.456446633
Sum: 15644
Variance: 0.8260048971
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Common values

Value    Count   Frequency (%)
3        2032    44.2%
4        1531    33.3%
2        566     12.3%
5        353     7.7%
6        61      1.3%
1        38      0.8%
7        14      0.3%
8        2       < 0.1%
0        2       < 0.1%
9        1       < 0.1%

Smallest values

Value    Count   Frequency (%)
0        2       < 0.1%
1        38      0.8%
2        566     12.3%
3        2032    44.2%
4        1531    33.3%
5        353     7.7%
6        61      1.3%
7        14      0.3%
8        2       < 0.1%
9        1       < 0.1%

Largest values

Value    Count   Frequency (%)
9        1       < 0.1%
8        2       < 0.1%
7        14      0.3%
6        61      1.3%
5        353     7.7%
4        1531    33.3%
3        2032    44.2%
2        566     12.3%
1        38      0.8%
0        2       < 0.1%

bathrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 26
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.160815217
Minimum: 0
Maximum: 8
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1.75
median: 2.25
Q3: 2.5
95-th percentile: 3.5
Maximum: 8
Range: 8
Interquartile range (IQR): 0.75

Descriptive statistics

Standard deviation: 0.7837810747
Coefficient of variation (CV): 0.3627247107
Kurtosis: 1.86590471
Mean: 2.160815217
Median Absolute Deviation (MAD): 0.5
Skewness: 0.6160327234
Sum: 9939.75
Variance: 0.614312773
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)

Common values

Value                Count   Frequency (%)
2.5                  1189    25.8%
1                    743     16.2%
1.75                 629     13.7%
2                    427     9.3%
2.25                 419     9.1%
1.5                  291     6.3%
2.75                 276     6.0%
3                    167     3.6%
3.5                  162     3.5%
3.25                 136     3.0%
Other values (16)    161     3.5%

Smallest values

Value                Count   Frequency (%)
0                    2       < 0.1%
0.75                 17      0.4%
1                    743     16.2%
1.25                 3       0.1%
1.5                  291     6.3%
1.75                 629     13.7%
2                    427     9.3%
2.25                 419     9.1%
2.5                  1189    25.8%
2.75                 276     6.0%

Largest values

Value                Count   Frequency (%)
8                    1       < 0.1%
6.75                 1       < 0.1%
6.5                  1       < 0.1%
6.25                 2       < 0.1%
5.75                 1       < 0.1%
5.5                  4       0.1%
5.25                 4       0.1%
5                    6       0.1%
4.75                 7       0.2%
4.5                  29      0.6%

sqft_living
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 566
Distinct (%): 12.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2139.346957
Minimum: 370
Maximum: 13540
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 370
5-th percentile: 950
Q1: 1460
median: 1980
Q3: 2620
95-th percentile: 3870
Maximum: 13540
Range: 13170
Interquartile range (IQR): 1160

Descriptive statistics

Standard deviation: 963.2069158
Coefficient of variation (CV): 0.4502340833
Kurtosis: 8.2916826
Mean: 2139.346957
Median Absolute Deviation (MAD): 570
Skewness: 1.723513271
Sum: 9840996
Variance: 927767.5626
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
1940                  32      0.7%
1720                  32      0.7%
1660                  31      0.7%
1840                  31      0.7%
2000                  30      0.7%
1410                  29      0.6%
1200                  28      0.6%
1480                  28      0.6%
1700                  27      0.6%
1490                  27      0.6%
Other values (556)    4305    93.6%

Smallest values

Value                 Count   Frequency (%)
370                   1       < 0.1%
380                   1       < 0.1%
420                   1       < 0.1%
430                   1       < 0.1%
490                   1       < 0.1%
520                   1       < 0.1%
550                   1       < 0.1%
560                   1       < 0.1%
580                   1       < 0.1%
590                   2       < 0.1%

Largest values

Value                 Count   Frequency (%)
13540                 1       < 0.1%
10040                 1       < 0.1%
9640                  1       < 0.1%
8670                  1       < 0.1%
8020                  1       < 0.1%
7320                  1       < 0.1%
7270                  1       < 0.1%
7050                  1       < 0.1%
6980                  1       < 0.1%
6900                  1       < 0.1%

sqft_lot
Real number (ℝ≥0)

Distinct: 3113
Distinct (%): 67.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 14852.51609
Minimum: 638
Maximum: 1074218
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 638
5-th percentile: 1690.8
Q1: 5000.75
median: 7683
Q3: 11001.25
95-th percentile: 43560
Maximum: 1074218
Range: 1073580
Interquartile range (IQR): 6000.5

Descriptive statistics

Standard deviation: 35884.43614
Coefficient of variation (CV): 2.416050987
Kurtosis: 219.8729874
Mean: 14852.51609
Median Absolute Deviation (MAD): 2772
Skewness: 11.30713875
Sum: 68321574
Variance: 1287692757
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                  Count   Frequency (%)
5000                   80      1.7%
6000                   65      1.4%
4000                   54      1.2%
7200                   50      1.1%
4800                   29      0.6%
4500                   25      0.5%
9600                   25      0.5%
3000                   23      0.5%
5500                   23      0.5%
7500                   23      0.5%
Other values (3103)    4203    91.4%

Smallest values

Value                  Count   Frequency (%)
638                    1       < 0.1%
681                    1       < 0.1%
704                    1       < 0.1%
746                    1       < 0.1%
747                    1       < 0.1%
750                    1       < 0.1%
779                    1       < 0.1%
833                    1       < 0.1%
835                    1       < 0.1%
844                    2       < 0.1%

Largest values

Value                  Count   Frequency (%)
1074218                1       < 0.1%
641203                 1       < 0.1%
478288                 1       < 0.1%
435600                 2       < 0.1%
423838                 1       < 0.1%
389126                 1       < 0.1%
327135                 1       < 0.1%
307752                 1       < 0.1%
306848                 1       < 0.1%
284011                 1       < 0.1%

floors
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.512065217
Minimum: 1
Maximum: 3.5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 1.5
Q3: 2
95-th percentile: 2
Maximum: 3.5
Range: 2.5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.5382883773
Coefficient of variation (CV): 0.3559954763
Kurtosis: -0.5388519795
Mean: 1.512065217
Median Absolute Deviation (MAD): 0.5
Skewness: 0.5514406463
Sum: 6955.5
Variance: 0.2897543771
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)

Common values

Value    Count   Frequency (%)
1        2174    47.3%
2        1811    39.4%
1.5      444     9.7%
3        128     2.8%
2.5      41      0.9%
3.5      2       < 0.1%

Smallest values

Value    Count   Frequency (%)
1        2174    47.3%
1.5      444     9.7%
2        1811    39.4%
2.5      41      0.9%
3        128     2.8%
3.5      2       < 0.1%

Largest values

Value    Count   Frequency (%)
3.5      2       < 0.1%
3        128     2.8%
2.5      41      0.9%
2        1811    39.4%
1.5      444     9.7%
1        2174    47.3%

waterfront
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value    Count
0        4567
1        33

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4600
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value    Count   Frequency (%)
0        4567    99.3%
1        33      0.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count   Frequency (%)
0        4567    99.3%
1        33      0.7%

Most occurring characters

Value    Count   Frequency (%)
0        4567    99.3%
1        33      0.7%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    4600    100.0%

Most frequent character per category

Decimal Number
Value    Count   Frequency (%)
0        4567    99.3%
1        33      0.7%

Most occurring scripts

Value     Count   Frequency (%)
Common    4600    100.0%

Most frequent character per script

Common
Value    Count   Frequency (%)
0        4567    99.3%
1        33      0.7%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    4600    100.0%

Most frequent character per block

ASCII
Value    Count   Frequency (%)
0        4567    99.3%
1        33      0.7%

view
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value    Count
0        4140
2        205
3        116
4        70
1        69

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4600
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 4
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value    Count   Frequency (%)
0        4140    90.0%
2        205     4.5%
3        116     2.5%
4        70      1.5%
1        69      1.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count   Frequency (%)
0        4140    90.0%
2        205     4.5%
3        116     2.5%
4        70      1.5%
1        69      1.5%

Most occurring characters

Value    Count   Frequency (%)
0        4140    90.0%
2        205     4.5%
3        116     2.5%
4        70      1.5%
1        69      1.5%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    4600    100.0%

Most frequent character per category

Decimal Number
Value    Count   Frequency (%)
0        4140    90.0%
2        205     4.5%
3        116     2.5%
4        70      1.5%
1        69      1.5%

Most occurring scripts

Value     Count   Frequency (%)
Common    4600    100.0%

Most frequent character per script

Common
Value    Count   Frequency (%)
0        4140    90.0%
2        205     4.5%
3        116     2.5%
4        70      1.5%
1        69      1.5%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    4600    100.0%

Most frequent character per block

ASCII
Value    Count   Frequency (%)
0        4140    90.0%
2        205     4.5%
3        116     2.5%
4        70      1.5%
1        69      1.5%

condition
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 36.1 KiB

Value    Count
3        2875
4        1252
5        435
2        32
1        6

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4600
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 5
3rd row: 4
4th row: 4
5th row: 4

Common Values

Value    Count   Frequency (%)
3        2875    62.5%
4        1252    27.2%
5        435     9.5%
2        32      0.7%
1        6       0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value    Count   Frequency (%)
3        2875    62.5%
4        1252    27.2%
5        435     9.5%
2        32      0.7%
1        6       0.1%

Most occurring characters

Value    Count   Frequency (%)
3        2875    62.5%
4        1252    27.2%
5        435     9.5%
2        32      0.7%
1        6       0.1%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    4600    100.0%

Most frequent character per category

Decimal Number
Value    Count   Frequency (%)
3        2875    62.5%
4        1252    27.2%
5        435     9.5%
2        32      0.7%
1        6       0.1%

Most occurring scripts

Value     Count   Frequency (%)
Common    4600    100.0%

Most frequent character per script

Common
Value    Count   Frequency (%)
3        2875    62.5%
4        1252    27.2%
5        435     9.5%
2        32      0.7%
1        6       0.1%

Most occurring blocks

Value    Count   Frequency (%)
ASCII    4600    100.0%

Most frequent character per block

ASCII
Value    Count   Frequency (%)
3        2875    62.5%
4        1252    27.2%
5        435     9.5%
2        32      0.7%
1        6       0.1%

sqft_above
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 511
Distinct (%): 11.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1827.265435
Minimum: 370
Maximum: 9410
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 370
5-th percentile: 860
Q1: 1190
median: 1590
Q3: 2300
95-th percentile: 3440
Maximum: 9410
Range: 9040
Interquartile range (IQR): 1110

Descriptive statistics

Standard deviation: 862.168977
Coefficient of variation (CV): 0.4718356515
Kurtosis: 4.070138265
Mean: 1827.265435
Median Absolute Deviation (MAD): 490
Skewness: 1.494210748
Sum: 8405421
Variance: 743335.3448
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
1200                  47      1.0%
1010                  47      1.0%
1300                  45      1.0%
1140                  44      1.0%
1320                  43      0.9%
1150                  42      0.9%
1090                  40      0.9%
1180                  40      0.9%
1400                  38      0.8%
1050                  37      0.8%
Other values (501)    4177    90.8%

Smallest values

Value                 Count   Frequency (%)
370                   1       < 0.1%
380                   1       < 0.1%
420                   1       < 0.1%
430                   1       < 0.1%
490                   1       < 0.1%
520                   1       < 0.1%
550                   3       0.1%
560                   1       < 0.1%
580                   1       < 0.1%
590                   2       < 0.1%

Largest values

Value                 Count   Frequency (%)
9410                  1       < 0.1%
8020                  1       < 0.1%
7680                  1       < 0.1%
7320                  1       < 0.1%
6640                  1       < 0.1%
6430                  1       < 0.1%
6420                  1       < 0.1%
6120                  1       < 0.1%
6070                  1       < 0.1%
6050                  1       < 0.1%

sqft_basement
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 207
Distinct (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 312.0815217
Minimum: 0
Maximum: 4820
Zeros: 2745
Zeros (%): 59.7%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 610
95-th percentile: 1210
Maximum: 4820
Range: 4820
Interquartile range (IQR): 610

Descriptive statistics

Standard deviation: 464.1372281
Coefficient of variation (CV): 1.487230726
Kurtosis: 4.082380024
Mean: 312.0815217
Median Absolute Deviation (MAD): 0
Skewness: 1.642732192
Sum: 1435575
Variance: 215423.3665
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
0                     2745    59.7%
500                   53      1.2%
600                   45      1.0%
800                   43      0.9%
900                   41      0.9%
700                   38      0.8%
1000                  33      0.7%
400                   33      0.7%
550                   27      0.6%
750                   26      0.6%
Other values (197)    1516    33.0%

Smallest values

Value                 Count   Frequency (%)
0                     2745    59.7%
20                    1       < 0.1%
50                    1       < 0.1%
60                    2       < 0.1%
65                    1       < 0.1%
70                    1       < 0.1%
80                    3       0.1%
90                    2       < 0.1%
100                   14      0.3%
110                   2       < 0.1%

Largest values

Value                 Count   Frequency (%)
4820                  1       < 0.1%
4130                  1       < 0.1%
2850                  1       < 0.1%
2730                  1       < 0.1%
2550                  2       < 0.1%
2360                  1       < 0.1%
2330                  1       < 0.1%
2300                  1       < 0.1%
2200                  1       < 0.1%
2180                  1       < 0.1%

yr_built
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 115
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1970.786304
Minimum: 1900
Maximum: 2014
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 1900
5-th percentile: 1913
Q1: 1951
median: 1976
Q3: 1997
95-th percentile: 2009
Maximum: 2014
Range: 114
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 29.73184839
Coefficient of variation (CV): 0.0150862873
Kurtosis: -0.6700759004
Mean: 1970.786304
Median Absolute Deviation (MAD): 23
Skewness: -0.50215519
Sum: 9065617
Variance: 883.9828087
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count   Frequency (%)
2006                  111     2.4%
2005                  104     2.3%
2007                  93      2.0%
2004                  92      2.0%
1978                  90      2.0%
2003                  89      1.9%
2008                  89      1.9%
1967                  82      1.8%
1977                  80      1.7%
2014                  78      1.7%
Other values (105)    3692    80.3%

Smallest values

Value                 Count   Frequency (%)
1900                  22      0.5%
1901                  9       0.2%
1902                  10      0.2%
1903                  10      0.2%
1904                  9       0.2%
1905                  19      0.4%
1906                  27      0.6%
1907                  12      0.3%
1908                  19      0.4%
1909                  22      0.5%

Largest values

Value                 Count   Frequency (%)
2014                  78      1.7%
2013                  57      1.2%
2012                  33      0.7%
2011                  24      0.5%
2010                  28      0.6%
2009                  50      1.1%
2008                  89      1.9%
2007                  93      2.0%
2006                  111     2.4%
2005                  104     2.3%

yr_renovated
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 60
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 808.6082609
Minimum: 0
Maximum: 2014
Zeros: 2735
Zeros (%): 59.5%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1999
95-th percentile: 2011
Maximum: 2014
Range: 2014
Interquartile range (IQR): 1999

Descriptive statistics

Standard deviation: 979.4145364
Coefficient of variation (CV): 1.211234888
Kurtosis: -1.851110913
Mean: 808.6082609
Median Absolute Deviation (MAD): 0
Skewness: 0.3859187009
Sum: 3719598
Variance: 959252.8341
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count   Frequency (%)
0                    2735    59.5%
2000                 170     3.7%
2003                 151     3.3%
2009                 109     2.4%
2001                 109     2.4%
2005                 95      2.1%
2004                 77      1.7%
2014                 72      1.6%
2006                 68      1.5%
2013                 61      1.3%
Other values (50)    953     20.7%

Smallest values

Value                Count   Frequency (%)
0                    2735    59.5%
1912                 33      0.7%
1913                 1       < 0.1%
1923                 57      1.2%
1934                 6       0.1%
1945                 7       0.2%
1948                 1       < 0.1%
1953                 1       < 0.1%
1954                 8       0.2%
1955                 2       < 0.1%

Largest values

Value                Count   Frequency (%)
2014                 72      1.6%
2013                 61      1.3%
2012                 45      1.0%
2011                 54      1.2%
2010                 30      0.7%
2009                 109     2.4%
2008                 45      1.0%
2007                 7       0.2%
2006                 68      1.5%
2005                 95      2.1%

city
Real number (ℝ≥0)

ZEROS

Distinct: 44
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.546521739
Minimum: 0
Maximum: 43
Zeros: 123
Zeros (%): 2.7%
Negative: 0
Negative (%): 0.0%
Memory size: 36.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 4
Q3: 14
95-th percentile: 27
Maximum: 43
Range: 43
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.162660506
Coefficient of variation (CV): 1.072092342
Kurtosis: 0.8150428561
Mean: 8.546521739
Median Absolute Deviation (MAD): 3
Skewness: 1.21114425
Sum: 39314
Variance: 83.95434754
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=44)

Common values

Value                Count   Frequency (%)
1                    1573    34.2%
18                   293     6.4%
3                    286     6.2%
4                    235     5.1%
14                   187     4.1%
13                   187     4.1%
2                    185     4.0%
9                    176     3.8%
8                    175     3.8%
12                   148     3.2%
Other values (34)    1155    25.1%

Smallest values

Value                Count   Frequency (%)
0                    123     2.7%
1                    1573    34.2%
2                    185     4.0%
3                    286     6.2%
4                    235     5.1%
5                    96      2.1%
6                    50      1.1%
7                    36      0.8%
8                    175     3.8%
9                    176     3.8%

Largest values

Value                Count   Frequency (%)
43                   2       < 0.1%
42                   2       < 0.1%
41                   1       < 0.1%
40                   6       0.1%
39                   1       < 0.1%
38                   28      0.6%
37                   11      0.2%
36                   29      0.6%
35                   4       0.1%
34                   29      0.6%

Interactions


Correlations


Auto

The auto setting is an easily interpretable pairwise column metric that picks a method from the types of the two columns: categorical-categorical uses Cramér's V, numerical-categorical uses Cramér's V (with the numerical column discretized), and numerical-numerical uses Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
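
The mapping described above can be sketched as a small dispatch function. The function name and the type labels `"numerical"` and `"categorical"` are illustrative assumptions, not the report software's own identifiers.

```python
# Sketch of the "auto" mapping: choose a correlation method from the
# types of the two columns. Names and labels here are illustrative only.

def auto_method(type_a: str, type_b: str) -> str:
    kinds = {type_a, type_b}
    if not kinds <= {"numerical", "categorical"}:
        raise ValueError(f"unsupported variable types: {type_a!r}, {type_b!r}")
    if kinds == {"numerical"}:
        return "Spearman's rho"
    # categorical-categorical, or numerical-categorical after the
    # numerical column has been discretized
    return "Cramer's V"


print(auto_method("numerical", "numerical"))
print(auto_method("numerical", "categorical"))
```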

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
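
The calculation just described can be sketched in plain Python: rank both variables (tied values share the mean of their positions), then apply the covariance-over-standard-deviations formula to the ranks. The helper names are illustrative, and the sample values are the sqft_living and price entries of the first five rows of this dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


sqft_living = [1340, 3650, 1930, 2000, 1940]
price = [313000.0, 2384000.0, 342000.0, 420000.0, 550000.0]
print(round(spearman(sqft_living, price), 4))
```

Because only ranks enter the formula, any strictly increasing transformation of either variable leaves ρ unchanged.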

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
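
A minimal Python sketch of this formula, using sample (n - 1) covariance and standard deviations, also illustrates the invariance property: shifting each variable and rescaling it by a positive factor leaves r unchanged. The data values and the function name are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Illustrative data: roughly linear with a little noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Separate changes of location and (positive) scale for each variable.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(round(r, 6), round(r_transformed, 6))
```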

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
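
The pair-counting definition can be sketched directly in Python. This is the simple variant often called τ-a, which makes no adjustment for ties; SciPy and pandas default to the tie-adjusted τ-b, so results can differ when the data contain ties. The function name is illustrative, and the sample values are the sqft_living and price entries of the first five rows of this dataset.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


sqft_living = [1340, 3650, 1930, 2000, 1940]
price = [313000.0, 2384000.0, 342000.0, 420000.0, 550000.0]
tau = kendall_tau_a(sqft_living, price)
print(tau)
```

Of the ten pairs here, nine are concordant and one is discordant (the houses with 2000 and 1940 square feet), so τ = (9 - 1) / 10.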

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
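
A sketch of the bias-corrected estimator, under the assumption that the commonly cited form of Bergsma's correction is meant: with φ² = χ²/n for an r × k contingency table, the corrected quantities are φ²' = max(0, φ² - (k-1)(r-1)/(n-1)), r' = r - (r-1)²/(n-1) and k' = k - (k-1)²/(n-1), and V = sqrt(φ²' / min(k'-1, r'-1)). The function name is illustrative, the χ² statistic is computed without a continuity correction, and the report's software may differ in such details.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


independent = [[10, 20], [20, 40]]  # rows proportional: no association
perfect = [[50, 0], [0, 50]]        # each row maps to exactly one column
print(cramers_v_corrected(independent), round(cramers_v_corrected(perfect), 6))
```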

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

      price           bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  condition  sqft_above  sqft_basement  yr_built  yr_renovated  city
0     313000.0        3.0       1.50       1340         7912      1.5     0           0     3          1340        0              1955      2005          0
1     2384000.0       5.0       2.50       3650         9050      2.0     0           4     5          3370        280            1921      0             1
2     342000.0        3.0       2.00       1930         11947     1.0     0           0     4          1930        0              1966      0             2
3     420000.0        3.0       2.25       2000         8030      1.0     0           0     4          1000        1000           1963      0             3
4     550000.0        4.0       2.50       1940         10500     1.0     0           0     4          1140        800            1976      1992          4
5     490000.0        2.0       1.00       880          6380      1.0     0           0     3          880         0              1938      1994          1
6     335000.0        2.0       2.00       1350         2560      1.0     0           0     3          1350        0              1976      0             4
7     482000.0        4.0       2.50       2710         35868     2.0     0           0     3          2710        0              1989      0             5
8     452500.0        3.0       2.50       2430         88426     1.0     0           0     4          1570        860            1985      0             6
9     640000.0        4.0       2.00       1520         6200      1.5     0           0     3          1520        0              1945      2010          1

Last rows

      price           bedrooms  bathrooms  sqft_living  sqft_lot  floors  waterfront  view  condition  sqft_above  sqft_basement  yr_built  yr_renovated  city
4590  380680.555556   4.0       2.50       2620         8331      2.0     0           0     3          2620        0              1991      0             18
4591  396166.666667   3.0       1.75       1880         5752      1.0     0           0     4          940         940            1945      0             1
4592  252980.000000   4.0       2.50       2530         8169      2.0     0           0     3          2530        0              1993      0             12
4593  289373.307692   3.0       2.50       2538         4600      2.0     0           0     3          2538        0              2013      1923          9
4594  210614.285714   3.0       2.50       1610         7223      2.0     0           0     3          1610        0              1994      0             2
4595  308166.666667   3.0       1.75       1510         6360      1.0     0           0     4          1510        0              1954      1979          1
4596  534333.333333   3.0       2.50       1460         7573      2.0     0           0     3          1460        0              1983      2009          3
4597  416904.166667   3.0       2.50       3010         7014      2.0     0           0     3          3010        0              2009      0             18
4598  203400.000000   4.0       2.00       2090         6630      1.0     0           0     3          1070        1020           1974      0             1
4599  220600.000000   3.0       2.50       1490         8102      2.0     0           0     4          1490        0              1990      0             23